AITopics | McKenzie County

Collaborating Authors

McKenzie County

Minimum Wasserstein distance estimator under covariate shift: closed-form, super-efficiency and irregularity

arXiv.org Machine LearningJan-13-2026

Covariate shift arises when covariate distributions differ between source and target populations while the conditional distribution of the response remains invariant, and it underlies problems in missing data and causal inference. We propose a minimum Wasserstein distance estimation framework for inference under covariate shift that avoids explicit modeling of outcome regressions or importance weights. The resulting W-estimator admits a closed-form expression and is numerically equivalent to the classical 1-nearest neighbor estimator, yielding a new optimal transport interpretation of nearest neighbor methods. We establish root-$n$ asymptotic normality and show that the estimator is not asymptotically linear, leading to super-efficiency relative to the semiparametric efficient estimator under covariate shift in certain regimes, and uniformly in missing data problems. Numerical simulations, along with an analysis of a rainfall dataset, underscore the exceptional performance of our W-estimator.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

2601.07282

Country:

Asia > Bangladesh (0.04)
North America > United States > New York (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

5c7465e4a236b56226ca221f3d16bf2b-Paper-Conference.pdf

Neural Information Processing SystemsNov-20-2025, 12:38:02 GMT

Text-to-Image (T2I) has witnessed significant advancements, demonstrating superior performance for various generative tasks.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

South America (0.04)
Oceania > New Zealand (0.04)
Oceania > Micronesia (0.04)
(15 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Media (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Online Consistency of the Nearest Neighbor Rule

Neural Information Processing SystemsNov-19-2025, 16:26:27 GMT

In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction. A learning rule is online consistent if its mistake rate eventually vanishes. The nearest neighbor rule (Fix and Hodges, 1951) is a fundamental prediction strategy, but it is only known to be consistent under strong statistical or geometric assumptions--the instances come i.i.d. or the label classes are well-separated. We prove online consistency for all measurable functions in doubling metric spaces under the mild assumption that the instances are generated by a process that is uniformly absolutely continuous with respect to a finite, upper doubling measure.

artificial intelligence, machine learning, nearest neighbor rule, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > North Dakota > McKenzie County (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.66)

Add feedback

An adaptive nearest neighbor rule for classification

Akshay Balsubramani, Sanjoy Dasgupta, yoav Freund, Shay Moran

Neural Information Processing SystemsNov-18-2025, 02:14:58 GMT

We introduce an adaptive nearest neighbor classification rule.

artificial intelligence, convergence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > North Dakota > McKenzie County (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.50)

Add feedback

4fd5aadb85a00525415e3733cb96ed68-Paper.pdf

Neural Information Processing SystemsNov-16-2025, 09:37:31 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Texas (0.04)
North America > United States > North Dakota > McKenzie County (0.04)
(2 more...)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.47)

Add feedback

Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

Neural Information Processing SystemsNov-16-2025, 07:43:30 GMT

This induces to the use of Laplacian regularization.

artificial intelligence, inductive learning, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Texas (0.04)
North America > United States > North Dakota > McKenzie County (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)

Add feedback

Optimal Binary Classification Beyond Accuracy

Neural Information Processing SystemsNov-15-2025, 01:18:59 GMT

The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in imbalanced binary classification, where data are dominated by samples from one of two classes.

artificial intelligence, classifier, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Texas (0.04)
North America > United States > North Dakota > McKenzie County (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback

Unsupervised Document and Template Clustering using Multimodal Embeddings

Sampaio, Phillipe R., Maxcici, Helene

arXiv.org Artificial IntelligenceOct-28-2025

We study unsupervised clustering of documents at both the category and template levels using frozen multimodal encoders and classical clustering algorithms. We systematize a model-agnostic pipeline that (i) projects heterogeneous last-layer states from text-layout-vision encoders into token-type-aware document vectors and (ii) performs clustering with centroid- or density-based methods, including an HDBSCAN + $k$-NN assignment to eliminate unlabeled points. We evaluate eight encoders (text-only, layout-aware, vision-only, and vision-language) with $k$-Means, DBSCAN, HDBSCAN + $k$-NN, and BIRCH on five corpora spanning clean synthetic invoices, their heavily degraded print-and-scan counterparts, scanned receipts, and real identity and certificate documents. The study reveals modality-specific failure modes and a robustness-accuracy trade-off, with vision features nearly solving template discovery on clean pages while text dominates under covariate shift, and fused encoders offering the best balance. We detail a reproducible, oracle-free tuning protocol and the curated evaluation settings to guide future work on unsupervised document organization.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.12116

Country: